-
Notifications
You must be signed in to change notification settings - Fork 69
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
THREAD_DUMPS_ON_SLOW_REQUEST: adding the base of the development of t… #558
THREAD_DUMPS_ON_SLOW_REQUEST: adding the base of the development of t… #558
Conversation
The slow request records component already includes thread dumps, at least for the thread handling the HTTP request (and IIRC for any thread it is blocked on as well); what additional value comes from this? I did not follow the motivation in Jira. |
Hi @jglick, thank you so much for your comment. As you said, the SlowRequest indeed captures the request thread. However, under my experience, making a performance analysis could be very tricky if we only have that information (the threads of the requests that hit the Slow Request feature). If the performance problem is locally located at the level of the feature that we are consuming with this request (for example if we are running a script in the Script Console that is consuming a lot of time), the thread of the request could be enough. I know that we have the complete thread dump in the High CPU checker. However, sometimes (and this is very common indeed), the performance degradation is not link to a high CPU usage scenario and this collection is not triggered. In this moment, you could ask why we would need to record this information if we can take the set of thread dumps manually during the slowness episode. The answer is simple: Jenkins is usually used to run periodic tasks and automatic tasks when nobody is on the site at the moment of the performance degradation. Furthermore, with the introduction of Jenkins in K8s and the fact that the readiness and liveness probes restart the pod under this kind of situation, it is very complex to perform a "post-morten" analysis once the pod is restarted. Kind regards, |
Just curious. Leaving review to @Dohbedoh. |
src/main/java/com/cloudbees/jenkins/support/slowrequest/SlowRequestThreadDumpsGenerator.java
Outdated
Show resolved
Hide resolved
src/main/java/com/cloudbees/jenkins/support/slowrequest/SlowRequestThreadDumpsGenerator.java
Outdated
Show resolved
Hide resolved
* Provide a means to disable the slow request thread dump checker. This is a volatile non-final field as if you run into | ||
* issues in a running Jenkins you may need to disable without restarting Jenkins. | ||
*/ | ||
public static volatile boolean DISABLED = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need to make it volatile
here ? Seems to me that only a single instance of this threqad shall be running but maybe I am missing something 🤔 .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are totally right. I just revisited the code, and as you said, we don't need the volatile
attribute here. The expectation is that only one thread will be running simultaneously. I will apply the change in my next commit.
for (int i = 0; i < TOTAL_ITERATIONS; i++) { | ||
File threadDumpFile = logs.file(format.format(new Date(fileNameDate)) + ".txt"); | ||
try (FileOutputStream fileOutputStream = new FileOutputStream(threadDumpFile)) { | ||
ThreadDumps.threadDump(fileOutputStream); | ||
logs.add(threadDumpFile); | ||
} catch (IOException ioe) { | ||
LOGGER.log( | ||
WARNING, | ||
"Support Core plugin can't generate automatically thread dumps under SlowRequest scenario", | ||
ioe); | ||
} finally { | ||
fileNameDate += FREQUENCY_SEC * 1000; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm maybe I am missing something here, but we never actually sleep or reschedule anything here, so we are actually generating TOTAL_ITERATIONS
thread dumps with different file names in a loop, not every FREQUENCY_SEC * 1000
seconds ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah definitely generating dump without the 5 sec interval:
% ll /tmp/je-1-home/slow-request-threaddumps
total 436
drwxr-xr-x 6 jenkins wheel 192 Jul 19 19:58 .
drwxr-xr-x 53 jenkins wheel 1696 Jul 19 19:56 ..
-rw-r--r-- 1 jenkins wheel 120211 Jul 19 19:58 20240719-195806.303.txt
-rw-r--r-- 1 jenkins wheel 105633 Jul 19 19:58 20240719-195811.303.txt
-rw-r--r-- 1 jenkins wheel 106723 Jul 19 19:58 20240719-195816.303.txt
-rw-r--r-- 1 jenkins wheel 105673 Jul 19 19:58 20240719-195821.303.txt
% stat /tmp/je-1-home/slow-request-threaddumps/20240719-195806.303.txt
16777233 9837190 -rw-r--r-- 1 jenkins wheel 0 120211 "Jul 19 19:58:06 2024" "Jul 19 19:58:06 2024" "Jul 19 19:58:06 2024" "Jul 19 19:58:06 2024" 4096 240 0 /tmp/je-1-home/slow-request-threaddumps/20240719-195806.303.txt
% stat /tmp/je-1-home/slow-request-threaddumps/20240719-195811.303.txt
16777233 9837191 -rw-r--r-- 1 jenkins wheel 0 105633 "Jul 19 19:58:07 2024" "Jul 19 19:58:06 2024" "Jul 19 19:58:06 2024" "Jul 19 19:58:06 2024" 4096 208 0 /tmp/je-1-home/slow-request-threaddumps/20240719-195811.303.txt
% stat /tmp/je-1-home/slow-request-threaddumps/20240719-195816.303.txt
16777233 9837194 -rw-r--r-- 1 jenkins wheel 0 106723 "Jul 19 19:58:07 2024" "Jul 19 19:58:06 2024" "Jul 19 19:58:06 2024" "Jul 19 19:58:06 2024" 4096 216 0 /tmp/je-1-home/slow-request-threaddumps/20240719-195816.303.txt
% stat /tmp/je-1-home/slow-request-threaddumps/20240719-195821.303.txt
16777233 9837195 -rw-r--r-- 1 jenkins wheel 0 105673 "Jul 19 19:58:08 2024" "Jul 19 19:58:07 2024" "Jul 19 19:58:07 2024" "Jul 19 19:58:06 2024" 4096 208 0 /tmp/je-1-home/slow-request-threaddumps/20240719-195821.303.txt
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Holy sh.. You are right. I forgot to add the sleeping call to wait for this FREQUENCY_SEC * 1000
seconds. As you said, currently, we are creating all those files at the same time.
I will add the sleep
instruction in the finally
block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some need to be addressed:
- the
SlowRequestThreadDumpsGenerator
actually generates thread dumps consecutively without intervals and give a misleading name to the files. - store the data under the
$JENKINS_HOME/support/
directory.
Another thing that bugs me is that if we tweak the settings, we could theoretically have several SlowRequestThreadDumpsGenerator
threads running (when iterations x frequency > recurrence period) and they would generate dumps. redundantly. And that make me feel like the generator should probably be a singleton that manage and launch the task.
…questThreadDumpsGenerator.java That is ok with me. I agree that we would need to move the slow requests and CPU folders into the support folder as well. Co-authored-by: Allan Burdajewicz <allan.burdajewicz@gmail.com>
…questThreadDumpsGenerator.java Co-authored-by: Allan Burdajewicz <allan.burdajewicz@gmail.com>
First of all, thank @Dohbedoh so much for your review. Regarding this last question, I trust your criteria better than mine. However, I would like to talk about the matter briefly before making the change. As you said, we could tune the variables to have two threads running simultaneously, and we should try to avoid it. However, I am afraid of making the class singleton. We are saving files during the thread execution and we are using the Java tool to do it. So, in case other threads affect the performance of those libraries (or it was blocking part of that code), would it not cause any kind of deadlock (or at least a performance degradation)? Knowing that the only benefit of the Kind regards, |
@ironcerocloudbees What I mean by singleton is a some kind of "Holder" for the generator Thread. To avoid creating new a |
Hi @Dohbedoh, Thank you again for your time here. I pushed a new version using a semaphore to avoid concurrencies. I have been reviewing different implementations of the Singleton for threads, and the semaphore looks a good option for us. If you could review it it would be great. Kind regards, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. That looks good. Maybe an AtomicBoolean
is better suited for the running
. That'd be my only suggestion here.
private static boolean isRunning() { | ||
boolean runningLocal; | ||
synchronized (SlowRequestThreadDumpsGenerator.class) { | ||
runningLocal = running; | ||
} | ||
return runningLocal; | ||
} | ||
|
||
private static void setRunningStatus(boolean runningLocal) { | ||
synchronized (SlowRequestThreadDumpsGenerator.class) { | ||
running = runningLocal; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be simplified using AtomicBoolean
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @Dohbedoh,
Thank you for the tip. I have never used AtomicBoolean, so I read the documentation about the topic, and I performed some changes to the PR. Could you please revisit it and confirm if the usage of AtomicBoolean is correct?
Kind regards,
src/main/java/com/cloudbees/jenkins/support/slowrequest/SlowRequestThreadDumpsGenerator.java
Show resolved
Hide resolved
…questThreadDumpsGenerator.java Co-authored-by: Allan Burdajewicz <allan.burdajewicz@gmail.com>
src/main/java/com/cloudbees/jenkins/support/slowrequest/SlowRequestThreadDumpsGenerator.java
Outdated
Show resolved
Hide resolved
…questThreadDumpsGenerator.java Co-authored-by: Jesse Glick <jglick@cloudbees.com>
…his feature
Testing done
I tested this feature installing the snapshot version into a testing environment (in the last
2.452.2
LTS version) and run the following tests:sleep(60000)
groovy code in theManage Jenkins -> Script Console
in multiples tabs at the same time (around 10).RECURRENCE_PERIOD_MIN
is working.SLOW_REQUEST_THREAD_DUMPS_TO_RETAIN
function works propertly.RECURRENCE_PERIOD_MIN
,MINIMAL_SLOW_REQUEST_COUNT
, etc) in the Jenkins startup to confirm that we can modify any of those parameters and the change affected the behavior of the new feature in the correct way.Proof of the feature working:
UI changes:
JIRA:
https://issues.jenkins.io/browse/JENKINS-73431
Submitter checklist